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ABSTRACT 

To date, the radial velocity (RV) method has been one of the most productive techniques 
for detecting and confirming extrasolar planetary candidates. Unfortunately, stellar activity 
can induce RV variations which can drown out or even mimic planetary signals - and it is 
notoriously difficult to model and thus mitigate the effects of these activity-induced nuisance 
signals. This is expected to be a major obstacle to using next-generation spectrographs to 
detect lower mass planets, planets with longer periods, and planets around more active stars. 
Enter Gaussian processes (GPs) which, we note, have a number of attractive features that 
make them very well suited to disentangling stellar activity signals from planetary signals. 
We present here a GP framework we developed to model RV time series jointly with ancillary 
activity indicators (e.g. bisector velocity spans, line widths, chromospheric activity indices), 
allowing the activity component of RV time series to be constrained and disentangled from 
e.g. planetary components. We discuss the mathematical details of our GP framework, and 
present results illustrating its encouraging performance on both synthetic and real RV datasets, 
including the publicly-available Alpha Centauri B dataset. 
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1 INTRODUCTION 

The radial velocity (RV) method, a.k.a. Doppler spectroscopy, has 
been one of the most productive methods for discovering exoplan¬ 
ets. To date, more than 500 confirmed planets^ have been discov¬ 
ered using the RV method, and until quite recently, more planets 
had been discovered with the RV method than with any other tech¬ 
nique. Though more planets have now been discovered by transit 
photometry - thanks to the unparalleled success of NASA’s Kepler 
observatory in recent years (Batalha et al. 2013; Marcy et al. 2014) 
- the RV method remains indispensable both in its own right for 
discovering planets, and moreover for confirming and helping to 
characterise candidate planets detected by other means, including 
transit photometry. 

Thanks to a number of technical developments, the precision 
of RV surveys has been constantly improving. Whereas the spec¬ 
trographs of fifty years ago produced RV measurements with er¬ 
rors in excess of 1 km (nominal precision per measurement), 
today’s cutting-edge, so-called “second generation’’ spectrographs 
can identify radial-velocity shifts down to a few tens of centime- 
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tres per second - enough to locate many rocky, Earth-like planets 
orbiting close to their host stars (Pepe et al. 2011; Lovis & Fischer 
2011). Examples of such spectrographs include the HARPS (High 
Accuracy Radial Velocity Planet Searcher) spectrograph hosted by 
the ESO 3.6 m telescope in Chile; its Northern Hemisphere com¬ 
ponent, HARPS-N, on the Italian 3.58 m Telescopio Nazionale 
Galileo telescope on La Palma Island; and HIRES (High Resolu¬ 
tion Echelle Spectrograph) at the Keck telescopes in Hawaii. Third 
generation spectrographs, to come online in the next few years, are 
expected to have measurement errors below 10 cm s“^(Pasquini 
et al. 2008; Pepe et al. 2010). 

This makes it possible, in principle, to discover exoplanets 
with increasingly low masses and increasingly longer periods, po¬ 
tentially reaching down to the Earth-mass and/or habitable regimes. 
This has motivated intensive, multi-season monitoring campaigns 
on current state-of-the-art instruments, and results from these cam¬ 
paigns are in turn bolstering the science cases for next-generation 
instruments to be deployed on the world’s largest telescopes. 

One announcement in particular caused a significant stir in 
the exoplanet community: that of a planet around the star a Cen B 
with an orbital period of ~ 3.24 days, and a minimum mass sim¬ 
ilar to that of the Earth, discovered using the HARPS instrument 
(Dumusque et al. 2012). If confirmed, this discovery would be par¬ 
ticularly remarkable not only because the minimum mass of the 
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planet is so low (the lowest of any planet around a Sun-type star), 
but also because the host star is the nearest Sun-like star to our 
own system, and the RV semi-amplitude reportedly induced by the 
planet (50 cm s“^) is the smallest to have been measured to date. 

Unfortunately, though we live in an age of RV instruments 
with ever-improving precisions, better instrumentation has meant 
that exoplanet hunters are also being increasingly confronted by a 
significant obstacle: the contaminating presence of RV signals from 
stars themselves, which is arguably the most important source of 
noise (or rather nuisance signal) in modern RV datasets. 

This, then, establishes the broad context for the present work, 
the focus of which is a Gaussian process framework we developed 
for modelling the activity-induced variations in RV time series, in 
order to disentangle activity from e.g. planetary contributions. 

The balance of our paper is structured as follows. In the next 
section (Section 2), we discuss these stellar nuisance signals in 
some more detail; we note that signals related to stellar activity are 
particularly difficult to mitigate. In Section 3 we present our new 
framework for modelling stellar activity signals jointly with pos¬ 
sible planetary signals, and discuss the potential advantages and 
shortcomings of our new framework. In Section 4, we demonstrate 
the successful application of our new framework to both simulated 
and real datasets, including the publicly available a Cen B dataset. 
Finally, we present our conclusions in Section 5, and discuss future 
plans for extending this work. 


2 STELLAR NUISANCE SIGNALS 

In order of increasing associated time-scales, the main physical 
phenomena in stars giving rise to RV perturbations are the follow¬ 
ing (see e.g. Dumusque 2012 for a detailed overview). 

(i) Oscillation of the external envelopes in Sun-like stars. Inter¬ 
ference of multiple p-mode oscillations can introduce RV variations 
on the order of tens of centimetres per second, over time-scales 
of minutes (Schrijver & Zwaan 2000; O’Toole et al. 2008; Michel 
et al. 2008). 

(ii) Granulation phenomena, driven by convective flows in the 
external layers of Sun-like stars. Time-scales can range from sev¬ 
eral minutes to up to two days (in the case of super-granulation), 
with associated disk-integrated RV signals on the order of meters 
per second (Dravins 1987; Michel et al. 2008; Mathur et al. 2011). 

(iii) Rotationally-modulated phenomena associated with active 
regions - dark spots and bright plages and faculae - which move 
across the observed stellar disk and break the flux balance between 
the redshifted and blueshifted halves of the disk (Dumusque et al. 
2011; Boisse et al. 2012). The major contribution to RV perturba¬ 
tions arises not directly from the presence of the spots and plages 
themselves, but from the strong magnetic fields associated with 
these active regions, which in turn lead to a suppression of the 
net blueshift normally associated with convective cells (Meunier 
et al. 2010). The strength of associated RV perturbations depends 
on rotation and surface activity levels, and can be as high as tens of 
meters per second (Saar & Donahue 1997); observations suggest 
that even chromospherically-quiet stars can be expected to have 
activity-induced RV signals on the order of 1-2 m (Isaacson 
& Fischer 2010). The associated time-scales are linked to the rota¬ 
tion period of the star, which can range from several hours to days 
or even weeks, as is the case for the Sun (Tassoul 2000; Nielsen 
etal. 2013). 

(iv) Long-term magnetic activity cycles. In Sun-like stars with 
long-term magnetic cycles, long-term modulations of activity levels 


and numbers of surface magnetic regions can lead to variations in 
stellar-origin RV perturbations of up to tens of meters per second 
over the course of many years (Gomes da Silva et al. 2012; Meunier 
& Lagrange 2013). 

While observational strategies can be devised to mitigate nui¬ 
sance RV signals associated with stellar oscillation and granula¬ 
tion (Dumusque et al. 2011), activity-induced signals pose a big¬ 
ger challenge. Apart from the larger amplitude associated with 
these signals, their periodic or quasi-periodic nature means they can 
sometimes mimic planetary RV signals (e.g. Desidera et al. 2004, 
Bonfils et al. 2007; Carolo et al. 2014; Haywood et al. 2014; San¬ 
tos et al. 2014). Moreover, the associated time-scales of these stellar 
signals can be comparable to those expected for planets: on the or¬ 
der of days or weeks for rotationally-modulated activity, and on the 
order of years for periodic modulations due to long-term activity 
cycles. Therefore, in the regime of small-amplitude RV variations 
associated with lower-mass planets, planetary detection and char¬ 
acterisation rapidly becomes limited not by the quality of available 
spectrographs, but rather by nuisance signals which can be very dif¬ 
ficult to disentangle from putative planetary signals. The Earth, for 
example, induces a reflex motion in the Sun of about 0.09 m s“^; 
however, though such a small RV signal could in principle be de¬ 
tected with next-generation spectrographs, in practice it would be 
masked by larger, activity-induced signals from the sun (Meunier 
et al. 2010; Dumusque 2012). 

There has thus been considerable interest, in recent years, in 
developing ways to mitigate the impact of activity, using a variety 
of methods ranging from exploiting simple correlations between 
the radial velocity measurements and chromospheric activity or line 
shape indicators (Boisse et al. 2009; Tuomi et al. 2014; Robertson 
& Mahadevan 2014), pre-whitening of the RV time series (Queloz 
et al. 2009), and using red-noise models (Tuomi et al. 2013; Feroz 
& Hobson 2014), to fitting sine waves at the rotational period of the 
star and its harmonics (Boisse et al. 2011; Dumusque et al. 2012) 
and detailed modelling of stellar surface features (Lanza et al. 2010; 
Boisse et al. 2012). Each of these approaches, though, has draw¬ 
backs. For example, since activity variations cannot be expected to 
be strictly periodic, modelling them using sinusoids (which are not 
orthonormal in the case of unevenly sampled data) based on a naive 
periodogram analysis can introduce harmonics of the subtracted si¬ 
nusoids into the data, and bias one way from genuine signals in the 
data (Tuomi et al. 2014). 

Aigrain et al. (2012, hereafter A12) showed that there should 
exist a relatively simple relationship between the photometric 
brightness and radial velocity variations of a spotted star. This leads 
to a straight-forward method to predict the RV variations from 
the light curve, with only two free parameters. Because it uses 
the stellar flux and its time derivative, this method was named the 
FF" method, but it can be applied only when simultaneous high- 
precision photometric and RV measurements are available. 

When this is not the case, information can still be gained by 
modelling the RV time series jointly with other parameters, which 
are extracted simultaneously from the stellar spectra. In particular, 
the width (FWHM) and degree of asymmetry of the spectral lines 
(usually measured from the CCF bisector span or bisector inverse 
slope), as well as chromospheric activity indicators (e.g. logi^HK, 
which measures the amount of emission in the cores of the Calcium 
II H c& K lines), are all sensitive to activity, but not to the presence of 
planets. Pont et al. (2011) were among the first to carry out explicit 
joint modelling of RVs and these ancillary activity indicators, in 
the context of CoRoT-7. They fit a simple spot model to the FWHM 
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and bisector span time series and used it to predict the RV variations 
due to activity. However, spot models are always very degenerate, 
making a robust exploration of the parameter space of the model 
impractical. 

In the next section we introduce an extension of the FF’ 
method, which uses the same principles, but models the RVs jointly 
with one or more of the aforementioned activity indicators. The 
various time series are modelled as linear combinations of a single, 
underlying Gaussian process (GP) and its time derivative, plus a 
separate noise term which is specific to each time series. 

3 PROPOSED FORMALISM 

3.1 Formulation of the physical model 

The physical model we propose to use is an extension of the FF' 
framework introduced in A12, which can be used to estimate RV 
variations due to stellar activity using photometry. In this frame¬ 
work, the fraction of the visible stellar hemisphere that is covered 
in spots is represented by F{f). For the simplest case of a single, 
small, equatorial spot, F{t) represents the projected area of the 
spot, and varies as cos(27rf/P), where P is the rotation period. 
A12 then showed that the RV perturbation due to activity (even for 
a non-equatorial spot) can be written: 

AKV = VrF{t)F{t) + V,F‘^{t), (1) 

where the first term represents the rotational modulation signal, and 
the second term the suppression of convective blueshift in magne¬ 
tized regions. This remarkably simple expression arises because the 
radial velocity of the stellar surface varies as sin(27rf/P), which is 
proportional to F{f), whereas the projection of the convective up- 
welling along the line of sight varies as cos(27rf/P), which is pro¬ 
portional to Fit). Both terms are multiplied by the projected spot 
area, which is proportional to F{t). The two constants K and 14 
can be related to physical quantities such as the fractional coverage 
and contrast of spots and faculae, the stellar equatorial velocity and 
the convective blueshift velocity, but are treated as free parameters 
for the present purpose. Equation (1) was derived by considering 
a single active region, but is used to describe the combined effect 
of all the active regions on the visible hemisphere. Though this is a 
first-order approximation, A12 carried out a number of tests of this 
formalism and showed that its performance was similar to that of 
more sophisticated methods. A caveat, however, is that photometry 
is insensitive to certain spot distributions, i.e. photometric signals 
will not always contain sufficient information adequately to pre¬ 
dict RV variations. This problem is addressed in this paper, where 
we propose to use more than one activity indicator to constrain the 
activity component in RV variations. 

We thus seek similar expressions to estimate RV variations 
due to stellar activity using activity diagnostics other than photom¬ 
etry. We consider the case where we have a chromospheric activ¬ 
ity indicator and a line asymmetry measurements available; for the 
sake of concreteness, we take these to be the log i?HK index and 
the inverse slope of the bisector of the cross-correlation function, 
or simply bisector inverse slope (BIS), respectively, as is the case 
for HARPS datasets. 

As a first approximation, it seems reasonable to expect 
log /?HK and to behave as F^ (t), the convective blueshift suppres¬ 
sion term, which is close proxy for the fractional coverage of active 
regions. On the other hand, like the RVs, the bisector inverse slope 
also depends on the velocity of the stellar surface at the location 


of active regions, so the expression for the BIS should include an 
additional F{t) F{t) term: 

logi?HK = L^F^(t) (2) 

BIS = B,F{t)F{t) + B,F\t), (3) 

where Lc, Be and Be are free parameters. Additionally, each time 
series will have its own noise term, which is treated as white (this 
is discussed in more detail in later sections). 

Equations 2 and 3 relate two particular activity-sensitive ob¬ 
servables (log R'uk and BIS) to spot coverage, although analogous 
relations could be constructed for other activity-sensitive observ¬ 
ables. For example, the Bhr chromospheric activity index can be 
related to spot coverage in the same way as the logBjjK index 
(Isaacson & Fischer 2010). Similarly, FWHM data are usually very 
tightly correlated with (if noisier than) log R'uk data - especially 
if there is a tight relation between the size and location of the pho- 
tospheric spots and that of the chromospheric structures such as 
faculae - and so the same form of relation could be used. 

3.2 Gaussian process framework 

Gaussian processes provide a mathematically-tractable and very 
flexible framework for performing Bayesian inference about func¬ 
tions. They are particularly suitable for the joint modelling of deter¬ 
ministic processes with stochastic processes of unknown functional 
forms (though with some known properties); they lend themselves 
very naturally to rigorous error propagation; and they allow us to 
relax the usually quite restrictive (e.g. linear) relationships which 
have been assumed in previous attempts at the joint modelling of 
RV with ancillary times-series (Tuomi et al. 2014). 

GP regression enables us to model complex stochastic pro¬ 
cesses by parametrising the covariance between pairs of datapoints, 
rather than writing down an expression for the data themselves; de¬ 
terministic components of the model can also be incorporated as 
the mean of the GP. Mathematically, the prior joint distribution of 
the outputs y (observables, in the case of our framework) is taken 
to be a multi-variate Gaussian: 

P(y) = A/'(y;m,K). (4) 

The mean vector m is given by 

m = m(x; 6), (5) 

where m is a mean function of the inputs x (observation times, for 
our framework) with parameters 6. The elements of the covariance 
matrix K are given by 

Kij = k{xi,Xj ■,(()), ( 6 ) 

where fc is a covariance kernel function with parameters (f). The 
covariance kernel function k provides the covariance element be¬ 
tween any two sample locations or times, Xi and Xj . 

The elements of m and K are, in the strict sense of the term, 
the parameters of the model; however, the values of these param¬ 
eters are controlled by a small number of hyper-parameters 6 and 
(p, and in practice will never be inferred directly. 

Gaussian distributions obey many convenient mathematical 
identities, which allow us trivially to marginalise over unobserved 
function values - even in the common case where there are an infi¬ 
nite number of such values. This remarkable property enables us to 
write down a marginal likelihood Cm,k{0, <p) for the data, given a 
GP model: 

\og\C^^k{.6,4>)] = i log(detK) - | log (27r), 
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(V) 

where r = y — m is the vector of residuals of the data after the 
mean function has been subtracted, and n is the number of data- 
points. This process is known as GP regression, and is described 
in more detail in textbooks such as Bishop (2006) and Rasmussen 
& Williams (2006); alternatively, see Roberts et al. (2013) for an 
accessible, tutorial-style introduction to GP regression. The free 
hyper-parameters 0 and (p can then be varied to maximise £m,k', 
this process is known as Type-II maximum likelihood, or marginal 
likelihood maximisation (Gibson et al. 2012). In so doing we refine 
vague distributions over many, very different functions - the forms 
of which are controlled by 9 and (p -to more precise distributions 
which are focused on functions that best explain our observed data. 

To compare models with different mean and covariance func¬ 
tions (m and k, respectively), we must compute the model evidence 
for each model, which involves numerical integration of the likeli¬ 
hood jCm,k and priors over hyper-parameters 0 and p. In practice, 
the computational cost of doing this can be extremely expensive. 
Furthermore, formulating physically motivated, statistically proper 
priors for each hyper-parameter can be challenging. On the other 
hand, it is often both theoretically-defensible and computationally- 
convenient (see Gibson et al. 2012) first to maximise £m,k with 
respect to all hyper-parameters, next to fix the ‘uninteresting’ ones 
(typically those in p, which are often also degenerate) to their max¬ 
imum a posteriori (MAP) values, and lastly to marginalise fully 
over the remaining parameters of interest (typically those in 9). 
This approximation avoids having to perform matrix inversion at 
every step of marginalisation - in practice, this entails marginal¬ 
ising over planet parameters, for example, but simply optimising 
the parameters of covariance kernels. We adopt this approximation 
throughout this paper. 


3.3 GP model for multiple time series 

Again for the sake of concreteness, we consider the case of a 
HARPS-like dataset, where we might wish to model jointly three 
observables: RV perturbations ARV, an activity index logRjjK. 
and the bisector inverse slope BIS. Further, to illustrate the way 
deterministic components can be incorporated into the framework, 
we assume - as is the case for the a Cen B dataset, analysed in 
Section 4.4 - that we wish to model dynamical effects in the RVs 
(binary motion, possible planet, etc.). 

In this case, we have N observations of one independent vari¬ 
able, time t, and three dependent variables: ARV, logi?HK, and 
BIS. In order to treat all three as different manifestations of a single 
underlying process, we re-arrange the data into 3N observations of 
a 2 -dimensional input x = [t,l], and of the corresponding output 
y, where = ARV if ( = 1, logRjjj,; if Z = 2 and BIS if 

I = 3. 

The baseline mean function is taken to be a second order poly¬ 
nomial, which in this example would suffice to represent the motion 
of a Cen B around the centre of mass of the a Cen binary system 
(Dumusque et al. 2012), for the RV data, and a constant (DC offset) 
for the other time series: 


m(x) 

= m^‘\t), 

( 8 ) 


— a-\-bt d?, 

(9) 

rri'^\t) 

= d, 

( 10 ) 

m^^\t) 

= e, 

( 11 ) 



( 12 ) 


where 9 — [a, b, c, d, e] are free hyper-parameters. One or more 
Keplerian terms (each with an additional five hyper-parameters) 
can be added to the mean function to represent planetary signals, 
but this baseline model has no planet. 

A white-noise component (to encapsulate observational noise, 
and also activity-induced and other stellar signals not captured by 
our model, arising e.g. from pulsation) is incorporated in the co- 
variance function as follows: 


fc(xi,Xj) = -f af. 5{ti - tj) 


(13) 


where cr;. is the standard deviation of the noise associated with 
each type of observation, and 5{x) is the Kronecker delta function. 

We next introduce a new latent (unobserved) variable, G{t) = 
which represents the activity term, and is described by a 
zero-mean Gaussian process with covariance function 7 . Conve¬ 
niently, we can then re-write equations (1), (2), and (3) as 


ARV 

= KG(t)-f 14G(Z), 

(14) 

log Rhk 

= TcG(Z), 

(15) 

BIS 

= B,G{t) + B,G{t), 

(16) 


where a factor 2 has been incorporated into the constants K and 
Br. We now make use of the fact that any linear combination of a 
GP and its derivative^ is also a GP; following Osborne (2010), the 
covariance between an observation of G at time ti and an observa¬ 
tion of G at time tj is 


(G,dG) 


d 




(17) 


where is used to denote the covariance between 

(non-derivative) observations of G at times ti and tj. Similarly, the 
covariance between two observations of G at times ti and tj is 


7 


(dG,dG) 


d d 


(t- t-^ - —— t') 

(G,t,)- 7 (t,t ) 


dt' dt 


(18) 


Using the above notation, the individual covariance func¬ 
tions over the various inputs can then be related to 7 as follows 
(in the notation below, k^l^^ = cov (ARV, log R^k), = 


^ In fact, any affine operator (ineluding linear, derivative, and integral oper¬ 
ators) applied to a GP yields another GP. This enables our framework to ex¬ 
ploit the well-known analytic identities for conditioning and marginalising 
Gaussian distributions, and makes the problem computationally tractable. 



cov (BIS, ARV), etc.): 
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k(23).,, 

^act 1^*5 

k) 

= LeBef°’'^\ti,tj 


)• 


(24) 

The final covariance matrix over the new input space is formed as 
the following block matrix: 

/ 1,(11) . (12) . (13) \ 

*^act l^acl l^act \ 

Kac.= k'f) . (25) 

k^®^) 1,(32) . (33) 1 

\ *^act ■*^act *^act / 

Provided 7 is a valid covariance function, Kact is guaranteed to be a 
valid covariance matrix: in particular, it will be symmetric and pos¬ 
itive semi-definite. In practice, then, it is not necessary to compute 
the blocks k*^/\ and explicitly. 

Thus, equipped with input vector x, output vector y, and co- 
variance matrix Kact, we are in a position to use the standard ma¬ 
chinery of Gaussian process inference to model all three time series 
jointly as manifestations of a single underlying Gaussian process. 
However, it remains for us to choose the form of the covariance 
function, 7, defining this underlying stochastic process. 


3.4 Choice of latent covariance function 

We now seek the simplest form for the covariance kernel function, 
7, that satisfies our beliefs and ignorance about the activity process 
we are modelling. 


3.4.1 Squared-exponential covariance 

For many time series, we expect the informativeness of past obser¬ 
vations in explaining current data to be a function of how long ago 
the past observations were made (for example, we might expect 
two observations made minutes apart to have similar values, but 
would not necessarily expect any connection between observations 
made months apart); if this is the case, we can restrict our con¬ 
sideration to stationary covariance functions which are dependent 
solely on |f — f'|. Such covariance functions can be represented as 
the Fourier transform of a normalised probability density function 
(via Bochner’s Theorem Rasmussen & Williams 2006), which can 
be interpreted as the spectral density of the process. 

The most widely-used covariance function of this class - in¬ 
deed, probably the most widely-used covariance function in GP in¬ 
ference - is the squared-exponential or Gaussian kernel function. 
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given by 

. (26) 

The hyper-parameter p governs the output scale (gain/amplitude) 
of our function, and A controls the time scale. In our problem the 
amplitude parameter /3 is not strictly necessary since Gif) is never 
directly observed, and /3 is absorbed by the model parameters 14, 
Vt, Lc, Be, and Br, as in Equations 1-3. 

The popularity of this covariance function stems not only from 
its simplicity, but also the fact that it is infinitely differentiable, 
meaning that a GP with this covariance function will generate func¬ 
tions with no sharp discontinuities. 

The squared-exponential covariance function is usually flex¬ 
ible enough to model a wide range of smoothly-varying stochas¬ 
tic processes. However, as discussed in Section 1, we expect RV 
data and ancillary time series to show evidence for activity-induced 
variations on up to three distinct time-scales: years (magnetic ac¬ 
tivity cycle), weeks/months (rotation), and days (oscillation, gran¬ 
ulation). Thus it makes sense to generalise the covariance function 
in equation (26) to explicitly allow for multiple evolutionary time- 
scales: 

= ; (27) 

i=l L ^ . 

here, the pi’s control the relative amplitude of the N ^ 1 compo¬ 
nents, each with evolutionary time-scale Ai. 

We initially experimented with N = 3 separate additive 
terms, to describe possible variations on the short, medium, and 
long time-scales discussed in Section 2. These tests were carried 
out using both synthetic data (see Section 4.1) that by design incor¬ 
porated variations on multiple different time-scales, as well as real 
datasets including the G1 15 A and a Cen B datasets considered in 
Sections 4.3 and 4.4. However, we rapidly found that the longer 
time-scale term was invariably unnecessary: any reasonable choice 
for the medium time-scale term had to be fairly flexible, since rota¬ 
tional signals generally evolve from year to year, and so the longer 
term variations can be captured by the medium time-scale term. On 
the other hand, we found a separate short-time scale term was gen¬ 
erally required: without it, the residuals in all time series considered 
often exhibited obvious correlations. 

Appendix Al gives expressions for the covariance between 
observations of G and its derivative G, and between two observa¬ 
tions of G, for the case when G is defined by a squared-exponential 
covariance function as in equation (27). 


3.4.2 Quasi-periodic covariance 

While the squared-exponential covariance function given in the 
previous section is both flexible and an obvious starting point when 
not given much information about the stochastic process we are 
modelling, we can do better. In particular, we have a physical rea¬ 
son to expect a degree of periodicity in the activity signals, since 
they are modulated by the periodic rotation of the star. We there¬ 
fore consider the following quasi-periodic covariance function for 
the latent process Gif) - as previously considered by A12 to model 
observed variations in the Sun’s total irradiance, and by Haywood 
et al. (2014) to model correlated noise in RV residuals following an 
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application of the FF' method - formed hy multiplying® a station¬ 
ary kernel with a periodic one: 




2A2 


(28) 


where P and Ap correspond to the period and length scale of the 
periodic component of the variations, and Ae is an evolutionary 
time-scale. While Ae has units of time, Ap is dimensionless, as it 
is relative to P. Once again there is no need to introduce an ampli¬ 
tude parameter, as this will be controlled for various time series by 
the model parameters K, Vi, Lc, Be, and Sr, as in Equations 1-3. 

Examples of functions drawn from a Gaussian process with 
this quasi-periodic covariance function are given in Fig. 1. 

A word is in order about the interpretation of the hyper¬ 
parameters. While they could be related to spot persistence life¬ 
times, complexities of spot configurations, autocorrelation func¬ 
tion decay time-scales, etc., such direct interpretations are rarely 
straightforward or well-grounded theoretically, especially because 
of degeneracies between the hyper-parameters. In the case of mod¬ 
elling differential rotation and/or spot evolution, for example, the 
hyper-parameter P might become almost irrelevant, with a large 
value of Ap and a very small value of Ae allowing for complex, 
rapidly evolving waveforms without any obvious characteristic pe¬ 
riodicity. In future, an in-depth study of how the values of these 
hyper-parameters translate into physically-interpretable properties, 
for example using the large sample of Kepler light curves consid¬ 
ered by Aigrain et al. (2015) - which in turn highlights the difficulty 
of distinguishing between differential rotation and spot evolution - 
might be useful. 

In any event, the quasi-periodic covariance kernel function 
defined in equation (28) is physically-motivated (albeit not al¬ 
ways easy to interpret straightforwardly) and has only three hyper¬ 
parameters; moreover, in tests with both real and simulated data, 
we found it to produce as good or better results than when using a 
squared-exponential covariance function with a similar number of 
hyper-parameters. Therefore in all subsequent discussion the quasi- 
periodic covariance function will be assumed by default. (We do 
however discuss the squared-exponential covariance function be¬ 
cause it is conceivable that there will be cases, though not con¬ 
sidered here, where it might be a more appropriate choice than 
the quasi-periodic covariance; alternatively, it might be appropriate 
to generalise the function to include more than one quasi-periodic 
component.) 

Appendix A2 gives expressions for the covariance between 
observations of G and its derivative G, and between two observa¬ 
tions of G, for the case when G is defined by a squared-exponential 
covariance function as in equation (28). 


3.4.3 Other possible covariance functions 

There exists a wide variety of functions that can be used to specify 
the correlation between pairs of inputs, and thence to generate a 
covariance matrix over a set of observations and predicants. These 
functions can further be combined and modified in a multitude of 
ways, giving one great flexibility in how functions are modelled. 


In the preceding subsections, we considered two simple such func¬ 
tions that were deemed adequate for the cases considered in this 
paper; however, it would certainly be possible to replace these and 
use, within our same framework, other covariance functions (e.g. 
Rasmussen & Williams 2006). 

For example, as an alternative to using a sum of squared- 
exponential kernels, one might use a rational-quadratic kernel, 
given by: 

it, t') j ; (29) 

it may be shown that this is equivalent to a scale mixture of squared- 
exponential kernels with different length scales, with the latter be¬ 
ing distributed according to a Beta distribution with parameters a 
and A“®. 

On the other hand, if one has reason to expect that the latent 
process being modelled is not as smooth as functions drawn from a 
GP with squared-exponential kernel, one could consider the Matem 
class of covariance functions, defined by: 




r{u)2-' 


1 








(30) 


where /3 is an output scale, A is the input scale, r() is the standard 
Gamma function such that r(i^) = and B() is the modified 

Bessel function of second order. The hyperparameter v controls the 
degree of differentiability of the resultant functions modelled by a 
GP with Matern covariance function, such that they are only k- 
times differentiable if > k. For example, taking o — ^ leads 
to twice-differentiable functions, and as i/ —> oo, the Matem ker¬ 
nel becomes the squared-exponential one. Though a Matem covari¬ 
ance function with v < oo might be deemed more physically re¬ 
alistic than a squared-exponential one, its use does of course come 
with the penalty of some additional mathematical and computa¬ 
tional complexity. For the purposes of the present work, then, we 
did not carry out any tests with these more sophisticated covariance 
kernels; their use might, however, feature in future work."* 


3.5 Summary and appraisal of the framework 

The key aspects of the above framework are sketched schematically 
in Fig. 2. Though it may appear quite mathematically complex, es¬ 
pecially to those unfamiliar with GP regression, its underlying ideas 
are quite simple. To summarise: 

(i) we assume that the underlying stochastic process giving rise 
to activity signals in observables (RVs and ancillary time series) can 
be described by a GP, with suitably-chosen covariance function; 

(ii) we can then use physically-motivated or empirical models to 
provide analytical links between this GP and available observables; 

(iii) finally, with the addition of noise and deterministic compo¬ 
nents (e.g. dynamical effects for the RVs), all observables can be 
modelled jointly as GPs, with the ancillary time series serving to 
constrain any activity component of the RVs. 


® As described in Rasmussen & Williams (2006), a valid covariance func¬ 
tion under any arbitrary (smooth) map remains a valid covariance function. 
Valid covariance functions can also be constructed by adding or multiplying 
simpler covariance functions. 


Initial tests with the rational-quadratic kernel function led to very similar 
results as when using the squared-exponential kernel. A rigorous approach 
to choosing between models with different kernel functions would require 
computation of Bayes factors for all models considered, an extension to this 
work we defer to future consideration. 
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Figure 1. Examples of functions drawn from a Gaussian process with a quasi-periodic covariance function, as defined in Equation 28, in each panel with 
different hyper-pai'ameter values (three function draws per panel). The hyper-pai'ameter P sets a characteristic period for the function’s variations, A p controls 
the extent to which the function varies within one period, and Ae controls the time-scale over which the (quasi-)periodic component evolves (variations become 
strictly periodic as Ae —1 oo). In all cases, the function amplitudes are arbitrary, and a constant mean function (offset) is assumed. 


Our framework offers a number of advantages over existing 
approaches. Firstly, modelling all available activity-sensitive time 
series (line widths, asymmetries, chromospheric activity indices 
etc.) jointly with RVs should allow tighter constraints to be placed 
on activity signals in RVs, compared to exploiting only simple 
correlations between RVs and (typically) one of these other time 
series. Secondly, our general framework is flexible in that it uses 
Gaussian process draws (and derivatives thereof) as basis functions 
to model available time series, rather than e.g. sinusoids or other 
simple parametric models, the inappropriate use of which could 
lead to the inadvertent introduction of correlated signals into model 
residuals; in the case our GP framework, the properties of basis 
functions are informed in a more principled way by the observables. 
Thirdly, our framework facilitates smooth interpolation between 
observables, as well as extrapolation to future times (prediction). 
Lastly, being based on GPs, the entire framework can be accom¬ 
modated very naturally within the broader framework of Bayesian 
inference, allowing uncertainties to be handled in a principled way: 
overly-complex models are automatically penalised, nuisance pa¬ 
rameters can be marginalised over, and rigorous model compar¬ 
isons can be performed (Rasmussen & Williams 2006). 

A possible disadvantage of our framework is that, in its current 
form, it can only use the latent ‘activity’ GP and linear transforma¬ 
tions thereof (including derivatives) to generate the activity signals 
in observables; this is sufficient for linking e.g. RVs and photometry 
via the FF' method, but the latter is only a first-order model. Al¬ 
lowing for more general relationships between the latent Gaussian 
process and the observables might be possible, although it is not im¬ 
mediately clear that doing so would result in a computationally- or 



Figure 2. Simplified, schematic sketch of the our GP-based scheme for the 
joint modelling of an RV time series with ancillary activity diagnostics. The 
unobserved, stochastic process giving rise to activity-related RV variations 
(left panel) is assumed to be describable by a GP. The way in which this 
propagates into observable time series (middle panels) is computed analyt¬ 
ically, using a simple physical model, and so all observed time series can 
be modelled jointly. A noise model and deterministic physical component 
(right panel, e.g. Keplerian model to describe exoplanet-induced RV varia¬ 
tions) completes the basic framework. 
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mathematically-tractable framework. Another potential disadvan¬ 
tage is that ‘vanilla’ Gaussian-process regression is not well-suited 
to modelling large datasets (N ^ 1000 observations): GP regres¬ 
sion hinges on matrix inversion, with naive inversion algorithms 
scaling as . Fortunately there do exist a number of approxima¬ 
tion techniques which hold great promise for allowing efficient GP 
regression on larger datasets (Lazaro-Gredilla et al. 2010; Chalupka 
et al. 2012; Ambikasaran et al. 2014). In the case of time-series 
analysis, there exist very efficient 0{N) approaches for Gaussian 
process regression based on approximate re-interpretation as filters 
(Reece & Roberts 2010; Sarkka et al. 2013; Reece et al. 2014). 

It should be noted that this is not the first instance of GPs being 
used to model temporally-correlated stellar nuisance signals: Hou 
(2014) presented a framework for using GPs to model stochastic 
stellar oscillations in RV data, while Barclay et al. (2014) used GPs 
to model photometric granulation signals in a Kepler-91 dataset. 
The only published instances (to our knowledge) of GPs being used 
specifically to model stellar activity signals are the work of Hay¬ 
wood et al. (2014), who used a GP to mitigate the effects of active 
regions without photometric counterparts in the CoRoT-7 system, 
and the work of Grunblatt et al. (2015), who used a GP to model ac¬ 
tivity in Kepler-78 datasets. In both cases, quasi-periodic GPs were 
trained on photometric data, and a GP model with the same covari¬ 
ance properties then used to model available RV data. The frame¬ 
work we have presented may be seen as a more general and flexible 
version of the aforesaid approach, in the sense that our framework 
allows all available activity-sensitive time series to be modelled 
jointly, and does not necessarily require simultaneous RV and pho¬ 
tometry data. When both RV and photometric data are available, 
the FF' formalism is built directly into the modelling; unlike the 
FF' method, however, our framework would not require all active 
regions to have photometric signatures in order to constrain their 
RV contributions, provided at least one other activity-sensitive time 
series (e.g. BIS) were available. 


4 TESTS AND APPLICATIONS 

4.1 Simulated data using naive physical model 

As a first test of our GP framework, we used a number of different 
functions (e.g. polynomials; sinusoids; wavelets; function draws 
from GPs; etc.) to generate a range of synthetic time series to rep¬ 
resent possible latent, activity-driving processes. We included peri¬ 
odicities and correlations on many different time-scales, to test our 
models’ ability to deal simultaneously with the three different time- 
scales for variations discussed in Section 2 (as a simple example, a 
sine wave with evolving amplitude, added to a long-term parabolic 
trend). These time series were densely-sampled and noise-free. We 
then used the relations in equations (14), (15), and (16) to simulate 
ARV, log R'uk and BIS “observables”, computing derivatives nu¬ 
merically using finite-difference approximation where applicable. 
In other words, we generated the observables we would obtain for 
arbitrary activity processes, assuming the physical model we used 
to link the activity process to observables corresponded exactly to 
reality. 

Using a number of different covariance functions (including 
those discussed in Section 3.4), our general findings were the fol¬ 
lowing. 

(i) By performing unconstrained optimisation of the free param¬ 
eters in model likelihood functions Cm,k{0, aU observed time 
series could be modelled very accurately, with fits that appeared 


“exact” upon visual inspection (though in fact with small residuals 
consistent with numerical error due to finite precision arithmetic, 
matrix inversion, derivative approximations, etc.).® 

(ii) In all such cases, the fitted model’s latent process (GP) was 
also correspondingly close to the latent process used to generate 
the observables. In cases where the latent process used to generate 
the observables was itself a GP, the hyper-parameters of this latent 
process could be recovered accurately. 

(iii) Imposing constraints on the hyper-parameters of chosen 
mean and covariance functions led to the expected changes in fits. 
For example, constraining the evolutionary time-scale Ae (for the 
quasi-periodic covariance kernel) to be longer than a certain time 
prevented physically unrealistic, short-time scale “contortions” in 
the fits. Similarly, forcing a finite period parameter P (again in 
the quasi-periodic covariance kernel), with long evolutionary time- 
scale Ae, led to poor fits when the simulated observables contained 
no actual periodicities. 

(iv) When adding white Gaussian noise to the simulated observ¬ 
ables, the quality of fits remained good - as measured by a reduced 
chi-square statistic - provided that the estimated amplitudes of the 
added noise were reasonable. 

In sum, all aspects of the modelling worked as expected; we estab¬ 
lished that the analytical aspects of (if not the physical assumptions 
underlying) our framework were watertight, and that when this 
framework was implemented numerically, we could indeed jointly 
model multiple time series generated by an unobserved stochastic 
process and its derivative. 


4.2 Simulated data from SOAP 2.0 

SOAP (Spot Oscillation and Planet) 2.0 is a software package de¬ 
veloped by Dumusque et al. (2014) that simulates the effect of stel¬ 
lar spots and plages on radial velocity and photometry; it extends 
an older code, SOAP, developed by Boisse et al. (2012). The tool is 
available online at http : / /www .astro.up.pt/soap2 . 

In brief, the original SOAP code works by dividing up a simu¬ 
lated stellar disk into tens of thousands of resolution elements, with 
a CCF (cross-correlation function, representing the typical spectral 
line) for each element then being modelled by a Gaussian. Each 
CCF is Doppler-shifted according to the projected rotational veloc¬ 
ity, and weighted by a linear limb-darkening law. In this way, the 
non-spotted emission of the visible disk may be computed in pho¬ 
tometry and in RV. SOAP then calculates the positions of rotating 
surface inhomogeneities, which are defined by their latitudes, lon¬ 
gitudes, and sizes; the weighted CCF and flux for cells inside the 
inhomogeneities are added to (in the case of bright spots or plages) 
or removed from (in the case of dark spots) those of the non-spotted 
star. Finally, SOAP delivers the integrated spectral line, the flux, the 
RV, and the BIS as a function of stellar rotational phase. 

SOAP 2.0 extends SOAP by considering, additionally, the 
inhibition of convective blueshift inside active regions, the limb 
brightening effect of plages, a quadratic limb darkening law, a re¬ 
alistic active region contrast, and the resolution of the spectrograph 
used for the (simulated) observations. 


® Multi-modality did not seem to pose a significant problem with the like¬ 
lihood functions in question: a single run of a simple Nelder-Mead (“down¬ 
hill simplex”) algorithm often sufficed to locate globally-optimal solutions, 
though in a few cases the algorithm did benefit from multiple starting loca¬ 
tions. 
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Figure 3. GP model MAP fit to noise-free SOAP 2.0 data, based on a simulation of a single rotating spot at latitude (p = 45°, and radius 10% of the stellar 
radius. The dots indicate the simulated observations; the solid lines are model posterior means, and the shaded regions denote zbir posterior uncertainty. The 
posterior uncertainty in each time series is directly related to the amount of additive noise (ct; . terms) favoured in the MAP model. 


A number of stellar, spot, and spectral parameters can be pro¬ 
vided as inputs to the SOAP 2.0 code. In particular, the code allows 
up to four active regions (each specified by a size, brightness, lati¬ 
tude and longitude) to he included in the simulation. 

The SOAP 2.0 code does incorporate simplifications in its 
modelling: for example, the observed spectrum of a plage is con¬ 
sidered to be the same as that of a spot, and some of the stellar 
physics included in the modelling is informed only by Solar photo¬ 
sphere and spot spectra, which are assumed to be representative of 
all quiet photosphere regions and active regions. It also does not in¬ 
clude long-term magnetic activity cycles in its simulations. Its out¬ 
put nevertheless provides a far more realistic set of test cases than 
those considered in Section 4.1 - especially because it does not im¬ 
pose any ad hoc constraints on the functional relationship between 
photometric, RV and BIS time series. Moreover, Dumusque et al. 
(2014) showed that SOAP 2.0 manages to reproduce the activity- 
induced variation on the early-K dwarfs HD 189733 and a CenB; 


modelling the activity-induced variation on the latter star with our 
new GP framework is considered in Section 4.4. 

Given the wide range of possible outputs we could generate 
from SOAP 2.0 (and then modify in various ways), we adopted the 
following scheme in order to make a systematic study of how our 
GP modelling framework could be applied to realistic data. 

(i) Generate noise-free, densely-sampled outputs for a number 
of different spot configurations, and perform fits to these outputs 
(modelling photometric flux in place of logi?HK> t-S- using the 
FF' model; see Section 3.1). Based on these results, select a few 
qualitatively different configurations - including the configuration 
for which the poorest fits are obtained - for further study. 

(ii) Study the residuals from the fits in the selected noise-free 
cases, and examine how much additive white noise - i.e. the ai- 
terms in Equation (13) - should be allowed in the various time se¬ 
ries to ensure that activity is accurately filtered from RVs. 

(iii) Repeat the fitting as above (now allowing for additive white 
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Time [days] 

Figure 4. GP model MAP fit to noise-free SOAP 2.0 data, based on a simulation of four rotating spots (each with radius 5% of the stellar radius) at latitudes 
</> = 0°, 30°, 45°, 60°, equally spaced in longitude. The dots indicate the simulated observations; the solid lines are model posterior means, and the shaded 
regions denote ztcr posterior uncertainty. The posterior uncertainty in each time series is directly related to the amount of additive noise (tr;^ terms) favoured 
in the MAP model. 


noise in the GP model), but this time modifying the SOAP code’s 
output to include observational noise, realistic time sampling, etc. 

(iv) Repeat the fitting as above, but with the injection of a Kep- 
lerian signal into the RVs, and study whether the Keplerian signal 
can be disentangled from the activity signal. 

It should be emphasised that although this resembles a prin¬ 
cipled approach to studying our modelling framework using the 
SOAP 2.0 tool, we did not attempt to make a comprehensive study 
of (amongst other things) the inter-related effects of different lev¬ 
els of observational noise, different time samplings, different Ke¬ 
plerian amplitudes, periods or phases, etc. Apart from being be¬ 
yond the scope of this work, such an all-encompassing study would 
likely have been misguided, given that both our model and the 
SOAP 2.0 tool incorporates simplifying assumptions which are not 
easily quantified. As such, discrepancies between SOAP output and 
our GP models would not necessarily represent shortcomings of our 
model; conversely, good agreement between SOAP output and our 


model would not necessarily imply equally good agreement could 
be expected with real observational data. Rather, the purpose of 
these tests was to ascertain whether our framework would work at 
all with fairly realistic simulated data, to understand how the cr;. 
terms in our model should be constrained, and to get a conservative 
though quantifiable estimate of the possible shortcomings of our 
model. 

We found that by allowing proportionately more additive noise 
(the (T;. terms in Equation 13) in the ancillary time series than 
the RV time series - in effect, attaching less weight in the fitting 
to these time series, though still using them to constrain the la¬ 
tent process’ hyper-parameters - we were generally able to obtain 
RV residuals that resembled white Gaussian noise. In other words, 
we were only interested in modelling activity-induced variations 
in ARV accurately, and we could achieve this goal provided we 
“loosened” the assumed relationships between ARV and ancillary, 
activity-sensitive time series (see Section 3.3). When we did not 





A GP framework for modelling stellar activity 11 




Time [days] 


Figure 5. GP model MAP fit to SOAP 2.0 data, based on the same one-spot configuration as in Fig. 3, but this time with time sampling and noise levels taken 
from a real HARPS dataset. Each time series is split into four panels, to avoid plotting large gaps where no data are present. Residuals in the ARV time series 
are plotted below the simulated data and fitted model, but for the sake of clarity, with an arbitrary vertical offset from the main time series. 


allow for any white-noise in any of the time series, or set equal up¬ 
per bounds on the white noise in all time series (effectively giving 
equal weight to all time series), we generally ended up with better 
fits to the ancillary time series, poorer fits for ARV, and correlated 
residuals in all time series. 

In particular, we deduced the following approximate upper 
bounds to allow on the aq terms, if we wanted to ensure white 
Gaussian RV residuals: 

• (tarv: up to 5% of rms(ARV); 

• (Tflux: up to 10% of rms(flux); 

• (TBis: up to 20% of rms(BIS). 

Constraining the additive white noise in terms of the root mean 
square (rms) variability of a given time series reflects the fact that 
there are certain features in these time series that our model cannot 
capture accurately (e.g. an inflection point in the flux time series, 
when none exists in ARV), regardless of the details of the scaling. 


offsets, etc. of the time series in question. These constraints are sim¬ 
ply upper bounds on cr;., and serve to quantify the extent to which 
our framework fails to model all three time series simultaneously; 
in practice, the values of aq will usually be smaller than these up¬ 
per bounds, and will be determined by optimisation or marginali¬ 
sation, as for any of the other free parameters in our framework. 

To illustrate our findings, we consider the two representative 
configurations of a single spot (spot radius 10% of the stellar ra¬ 
dius) at latitude 4> = 45°, and of four equally-sized spots (each 
with radius 5% of the stellar radius) at 0 = 0°,30°,45°,60°, 
equally spaced in longitude. These configurations are representa¬ 
tive of cases that were particularly easy and difficult, respectively, 
to model with our GP framework. In both cases, the observed star in 
the simulation had Solar parameters (5778 K effective temperature, 
5115 K spot temperature, 25.05 d rotation period. Solar limb dark¬ 
ening coefficients, etc., as per Dumusque et al. 2014), and the sim¬ 
ulated instrument resolution (115,000) corresponded to a HARPS- 
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Figure 6. GP model MAP fit to SOAP 2.0 data, based on the same four-spot configuration as in Fig. 4, but this time with time sampling and noise levels taken 
from a real HARPS dataset. Each time series is split into four panels, to avoid plotting large gaps where no data are present. Residuals in the ARV time series 
are plotted below the simulated data and fitted model, but for the sake of clarity, with an arbitrary vertical offset from the main time series. 


like instrument. The observations we simulated were noise-free, 
and spanned three rotation periods (approximately 75 days), with 
uniformly randomised temporal sampling. 

We obtained MAP fits of our GP model to the simulated 
data; we used a quasi-periodic covariance kernel, and placed non- 
informative priors on all free model parameters. Uniform priors 
were used for parameters with known scales (e.g. P, expected to 
correspond to a stellar rotation period; or K and Vc, which set the 
amplitude for RV variations), while scale-invariant Jeffreys priors 
were used for all other parameters (e.g. Ap, Ae). Typical computa¬ 
tion time (on a modern laptop) was on the order of a few minutes 
to obtain a MAP solution, using a single run of a Nelder-Mead 
(downhill-simplex) algorithm. We established (probable) global 
optimality by making sure downhill-simplex runs from different 
starting points converged to the same optimum. 

Fig. 3 shows results for the ‘easy’ (one-spot) configuration, 
while Fig. 4 shows results for the ‘difficult’ (four-spot) configura¬ 


tion. In both cases, the ARV time series could be modelled ex¬ 
tremely accurately (rms of residuals on the order of 0.01 m in 
both cases). The ancillary time series were, necessarily, modelled 
less accurately - especially in the case of the four-spot configu¬ 
ration, where the equatorial spot contributed the most problematic 
features to the ancillary time series. The magnitude of the cr;. terms 
(additive white noise) used in the modelling is indicated by the size 
of the ±(T posterior uncertainties plotted in these figures. 

We next considered using more realistic time sampling 
and adding noise to the simulated observations. To do so we 
used temporal sampling taken directly from the publicly-available 
aCenB dataset from Dumusque et al. (2012), which featured 
nearly four years’ worth of data, with gaps of several months be¬ 
tween observing seasons. We also added white Gaussian noise to 
the SOAP 2.0 output: for each time series, the noise was scaled so 
to be the same fraction of the rms variation of the time series itself 
as was the corresponding a Cen B noise estimate. 
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Figure 7. GP model MAP fit to SOAP 2.0 data, based on the same four-spot configuration as in Fig. 4, with time sampling and noise levels taken from one 
season of a real HARPS dataset, plus an injected Keplerian signal. Here the injected signal has an amplitude comparable to, but period different from, the 
rotationally-modulated activity signals. The top panel shows the model fit to the ARV time series, including residuals (fits to ancillary time series not shown 
here); the middle panel shows the Keplerian component of the fit (open circles represent injected signal before noise was added); and the bottom panel shows 
the normalised Lomb-Scargle periodogram of the ARV residuals, along with false alarm probability thresholds. The dotted grey peak in the periodogram 
around 10.0 d indicates the significant excess power that remains in the ARV residuals when performing a fit without a planet (the rest of the power spectrum 
remains qualitatively similar, and for clarity is not included in full). 


Again we found that we could model ARV very accu¬ 
rately: for both spot configurations, the rms of ARV residuals was 
marginally smaller than the rms of the observational noise, indicat¬ 
ing that our fits were limited only by our (simulated) noise floor. 
Figs. 5 and 6 show results for the ‘easy’ and ‘difficult’ spot con¬ 
figurations, as before. Despite the gaps in the observational cover¬ 
age, the fitted models accurately recovered the structure of the pe¬ 
riodic variations in all time series (with the exception of a few out¬ 
lier points, reflecting our model’s inability to reproduce very sharp 
changes in one time series when another changes less sharply, since 
all time series are modelled with combinations of a single function 
and its first derivative). It is worth emphasising that when the data 


to be modelled include observational noise, the terms in our 
model serve to account for both this noise and any features in the 
ancillary time series that our model cannot accurately reproduce 
(we could artificially split the white-noise terms in the covariance 
matrix into ‘observational’ and ‘model imperfection’ components, 
although this would not affect in any way the practical implemen¬ 
tation of our model). 

Finally, having ascertained that our framework allowed 
activity-induced RV variations to be modelled accurately in iso¬ 
lation, we turned our attention to planet injection tests. We wanted 
to answer the question: Could we use our framework accurately 
to disentangle activity-induced RV variations from other injected 
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Figure 8. GP model MAP fit to SOAP 2.0 data, based on the same four-spot configuration as in Fig. 4, with time sampling and noise levels taken from 
one season of a real HARPS dataset, plus an injected Keplerian signal. Here the injected signal has an amplitude comparable to, and period identical to, the 
rotationally-modulated activity signals. The top panel shows the model fit to the ARV time series, including residuals (fits to ancillary time series not shown 
here); the middle panel shows the Keplerian component of the fit (open circles represent injected signal before noise was added); and the bottom panel shows 
the normalised Lomb-Scargle periodogram of the ARV residuals, along with false alarm probability thresholds. 


(non-activity) signals, e.g. planetary signals? Our general finding 
was that, as expected, modelling the ancillary, activity-sensitive 
time series in conjunction with ARV did indeed allow the activ¬ 
ity component of ARV to be very well constrained, but - crucially 
- without the non-activity RV signals being subsumed by the same 
GP model. Broadly speaking, we found that it became more and 
more difficult to disentangle injected Keplerian signals from activ¬ 
ity signals as the amplitude of the injected signals grew smaller 
relative to the activity signals (and, of course, relative to observa¬ 
tional noise). Having a Keplerian signal with a period very similar 
to that of the activity signal generally did not make disentangling 
activity and injected signal any more difficult, provided the ampli¬ 
tude of the Keplerian signal was not also much smaller than the 
activity signal. 


To illustrate these findings, we present as examples 
three qualitatively-different Keplerian signals injected into the 
ARV time series arising from the ‘difficult’ four-spot configura¬ 
tion considered earlier. The activity-induced ARV variations aris¬ 
ing from this configuration had a semi-amplitude of 1.65 m 
and a period of 25.05 days. We simulated observational noise using 
the a Cen B template as before, and we used time sampling corre¬ 
sponding to only one season of the a Cen B data, spanning approx¬ 
imately half a year (such that the simulated data sampled about 7 
stellar rotations). The injected Keplerian signals we consider have 
the following properties (where K is the RV semi-amplitude, and 
Pk is the period of the variations; we use Pk to distinguish this 
period from the period hyper-parameter P in our quasi-periodic 
covariance kernel): 
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Figure 9. GP model MAP fit to SOAP 2.0 data, based on the same four-spot configuration as in Fig. 4, with time sampling and noise levels taken from one 
season of a real HARPS dataset, plus an injected Keplerian signal. Here the injected signal has an amplitude five times smaller than, and period identical to, the 
rotationally-modulated activity signals. The top panel shows the model fit to the ARV time series, including residuals (fits to ancillary time series not shown 
here); the middle panel shows the Keplerian component of the fit (open circles represent injected signal before noise was added); and the bottom panel shows 
the normalised Lomb-Scargle periodogram of the ARV residuals, along with false alarm probability thresholds. 


(i) K = lAm s“\ Pk = 10.0 d; 

(ii) K =lAm s"\ Pk = 25.05 d; 

(iii) K = 0.28 m s“\ Pk = 25.05 d. 

Other orbital parameters were identical in all three cases; in partic¬ 
ular, eccentricity was fixed at e = 0.1. Case (i) corresponds to an 
orbiting planet which induces an RV signal with amplitude com¬ 
parable to the activity signal, and with a period that is not close to 
the stellar rotation period or any of its harmonics. Case (ii) adds 
the complication of the planet having a period very similar - iden¬ 
tical, in fact - to the period of the rotational activity signal, while 
case (iii) goes one step further by representing a planetary signal 
with periodic identical to, but amplitude fives times smaller than, 
the activity signal. 


Fits to data including these injected signals are presented 
in Figs. 7, 8, 9 (the Keplerian component was included in our 
GP model through its mean function; non-informative priors were 
placed on all Keplerian orbital parameters). MAP parameter esti¬ 
mates for selected Keplerian orbital elements in the planet injection 
tests, along with ±ct posterior uncertainties, are presented in Ta¬ 
ble 1. All parameter inference was performed using the MultiNest 
nested-sampling algorithm (Feroz & Hobson 2008; Feroz et al. 
2009, 2013), with the GP hyper-parameters first fixed at their MAP 
values, as per the computational approximation motivated in Gib¬ 
son et al. (2012). Typical computation time to obtain posterior dis¬ 
tributions for physical parameters, using MultiNest on a modem 
laptop, and using MultiNest’s default convergence criteria, was on 
the order of tens of minutes. Posteriors were generally unimodal 
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(although in general, of course, this would depend on the model in 
question, the data sampling, etc.) 

In cases (i) and (ii), the fitted Keplerian amplitudes and peri¬ 
ods agree with the true values to within a small fraction of a percent. 
In case (iii), the absolute errors in the fitted values are larger (the 
recovered amplitude, for example, is clearly a little smaller than 
the true amplitude), but of course one could never expect perfect 
fits given an imperfect model, noise, and discrete time sampling. 
Nevertheless, it is reassuring that the true values do still lie within 
the ±2 (t credible intervals around the MAP values. 

In all cases, the rms of the ARV residuals was on the order of 
0.2 m s“^. The Lomb-Scargle periodogram® of the ARV residuals 
Figs. 7, 8, 9 all show significant peaks at 6.5 d: we interpret this as 
the fourth harmonic of the activity signal, as this feature remained 
present in the ARV residuals even when the injected Keplerian sig¬ 
nal had a period that is not similar to the stellar rotation or any of 
its harmonics, e.g. as in case (i). The only other significant peaks in 
the periodogram of the residuals, clustered around 1 d, arise from 
the window function (time sampling) in the a Cen B dataset. 

In cases (i), when we did not include a Keplerian component in 
the modelling, the power spectrum of the ARV residuals contained 
significant excess power at 10.0 d, i.e. the planet’s period - a telltale 
sign that the periodic ARV variations could not be explained by ro¬ 
tational activity alone. When not including a Keplerian component 
in cases (ii) and (iii), visual inspection of the periodogram of resid¬ 
uals did not by itself point clearly to the presence of a planet with 
period 25.05 d (the same as the stellar rotation period); however, 
the reality of the Keplerian components could be inferred by the 
significant improvement in the likelihoods of the models (approx¬ 
imated e.g. using Bayesian Information Criteria; see Section 4.3) 
that included a planet. 

Thus we have demonstrated that our GP framework is able to 
disentangle activity signals from Keplerian signals, even when the 
Keplerian signal has a period identical to that of the activity signal, 
and an amplitude much smaller than the activity signal (close to the 
noise floor, in fact). Although the example we presented featured 
imperfect removal of the activity signal from the ARV data (since 
the residuals contained a signature of one of the activity signal’s 
harmonics), the ancillary time series served the role of constrain¬ 
ing very tightly the activity component in the ARV time series, 
allowing the Keplerian signal to be modelled accurately. Presum¬ 
ably, a more physically-realistic incarnation of our GP framework 
(see Section 5) will allow activity signals to be even more tightly 
constrained by the ancillary time series. We also note that the har¬ 
monic of the activity signal was not present in the ARV residuals 
when modelling the signal arising from the same configuration but 
with more data, as in Fig. 6, where nearly four years’ worth of data 
were included in the modelling. Nevertheless, the presence of a cor¬ 
related, periodic artifact of the activity signal in the RV residuals 
in this test underscores the importance of accurate activity mod¬ 
elling, and the dangers of hastening to a planetary interpretation of 
signals in residuals. It would be worth checking, in future work, 
whether a Bayesian model comparison test would clearly penalise 
and disfavour a model that included a second Keplerian component 
to explain this signal. 

We conclude the discussion of our SOAP 2.0 tests by noting 


® We computed normalised Lomb-Scargle periodograms based on the im¬ 
plementation described by Press (2007); false alarm probabilities were esti¬ 
mated by randomly permuting the original data, keeping observation times 
fixed. 


the following shortcomings of these tests, along with possibilities 
for future extensions. 

(i) Though the covariance kernel we used can naturally accom¬ 
modate quasi-periodic signals (see Fig. 1), our simulations did not 
include any spot evolution; the SOAP 2.0 code does not directly 
facilitate such evolution. In principle, though, activity signals aris¬ 
ing from different spot configurations could be combined, perhaps 
with large gaps in between the different signals, to simulate evolv¬ 
ing active regions. Long-term magnetic cycles could be simulated 
in a similar fashion. 

(ii) Non-informative (uniform and Jeffreys) priors were placed 
on all model parameters/hyper-parameters. Improved fitting could 
be expected if we were to use priors informed by a better under¬ 
standing of e.g. the hyper-parameters of our quasi-periodic covari¬ 
ance kernel, or more physically-motivated priors for Keplerian sig¬ 
nals. 

(iii) We did not include any plages in the active regions simu¬ 
lated by SOAP 2.0. At the time of writing, the SOAP 2.0 code’s 
output appeared to include a non-trivial amount of high-frequency 
noise when including plages in the simulations. Preliminary tests 
with a smoothing filter applied to the SOAP 2.0 output suggest 
that even when plages are included, the activity components in 
ARV time series can still be modelled accurately, although the 
large residuals in the ancillary time series in these tests underscore 
the simplicity of the physical model used in our framework. Com¬ 
plications arising from the use of smoothing filters notwithstanding, 
these tests suggest the need for a better understanding of the short¬ 
comings of the physical model (ultimately to be replaced, hope¬ 
fully, by a more sophisticated one) currently incorporated into our 
GP framework. 


4.3 The planet around G115 A 

Howard et al. (2014) - hereafter H14 - reported the discovery 
of low-mass planet orbiting G1 15 A, part of the G1 15 binary 
system, based on RVs from the Eta-Earth survey using HIRES. 
They characterise G1 15 Ah as a planet with minimum mass 
Msini = 5.35 ± O.75M0, based on an RV semi-amplitude 
of K = 2.94 ± 0.28 m s“^. The planet has an orbital period 
Pk = 11.4433 ± 0.0016 d, and an orbit that is consistent with 
circular. The detection and characterisation of G1 15 Ah was based 
on data with a 15-year baseline (1997 January through 2011 De¬ 
cember). 

G1 15 A itself is a modestly-active, M2 V dwarf. Over their 
15-year baseline, H14 detected a 9 ± 2.5 year activity cycle with a 
semi-amplitude of ~ 0.05, in the dimensionless units of Shk (de¬ 
rived from log Rjjk)! which represents a ~ 10% fractional change; 
this variation may be a magnetic cycle analogous to the solar cy¬ 
cle. On shorter time-scales, they detected a strong periodicity in 
S'hk measurements with period near 44 d, which they interpreted 
as rotationally-modulated activity. The fact that they also detected 
these ~ 44 d modulations in optical photometry and RV time se¬ 
ries supported this interpretation. On the other hand, the 11.44 d 
RV signal was not detected in photometry or Shk, strengthening 
the planetary interpretation of the 11.44 d signal. 

Since the 11.44 d period of G1 15 Ah could easily be mistaken 
for the fourth harmonic of the ~ 44 d stellar rotation period, H14’s 
publicly-available dataset provides an interesting test case for our 
GP model: an obvious (though perhaps unfounded) concern in this 
case would be that the GP activity model would simply absorb the 
planetary signal. 
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Table 1. MAP parameter estimates for selected Keplerian orbital elements in the planet injection tests, along with ztcr posterior uncertainties. Parameter 
inference was performed using the MultiNest nested-sampling algorithm, with non-informative priors placed on all GP mean function (Keplerian) parameters. 



A RV semi-amplitude K [m/s] 

Period Pk [days] 

Eccentricity e 



Model 

Truth 

Model 

Truth 

Model 

Truth 

Case (i) 

1.400 ±0.045 

1.40 

9.9941 ± 0.0086 

10.0 

0.118 ± 0.016 

0.10 

Case (ii) 

1.372 ±0.070 

1.40 

25.09 ±0.26 

25.05 

0.1050 ± 0.0081 

0.10 

Case (iii) 

0.334 ± 0.047 

0.28 

25.10 ±0.89 

25.05 

0.1051 ± 0.0080 

0.10 


Accordingly, we modelled the 15 years of data published by 
H14, comprising 117 RV and Shk measurements, 59 of which 
were taken in the 2011 season (BID >2,455,500). As for our ear¬ 
lier tests, we used a quasi-periodic covariance kernel for the la¬ 
tent process driving activity; we also included a linear RV trend 
in both our single-planet and planet-free models, to account for the 
~ 2 m yr“^ acceleration of G1 15 A by its companion G115 B. 

When we did not include a Keplerian component in the mean 
function for the ARV time series, the Lomb-Scargle periodogram 
of the residuals contained multiple significant peaks, including one 
at around 11.4 days, indicating that the putative planetary signal 
was not absorbed by the activity model: since this periodic mod¬ 
ulation was not present in the S'hk time series, our model did not 
allow it in the ARV time series. On the other hand, when includ¬ 
ing a Keplerian component with circular orbit in the ARV mean 
function, the quality of the fit improved significantly, and a Lomb- 
Scargle periodogram of the residuals indicated that no significant 
periodicities remained. Using Bayesian Information Criteria (BIG) 
as a first-order approximation to Bayes factors (Berger & Peric- 
chi 1996; Raftery 1999), our single planet model was strongly pre¬ 
ferred with ABIC = 27; H14 arrived at the same conclusion, with 
ABIC = 24. 

Our GP model, one-planet fit to H14’s data is presented in 
Fig. 10, while detailed diagnostics of the ARV residuals (rms: 
0.69 m s“^) are presented in Fig. 11. Our MAP estimate for the 
ARV semi-amplitude of the planetary signal was K — 2.22 ± 
0.36 m s“^, and our estimate for its period Pk = 11.4431 ± 
0.0010 d. H14 obtained K = 2.94 ± 0.28 m and Pk = 
11.4433 ± 0.0016 d; thus our approach to modelling H14’s dataset 
led us to detect a planet with period and amplitude consistent with 
those derived by H14, to within our respective 95% (2a) credi¬ 
ble intervals for these parameters. The MAP estimate of the pe¬ 
riod hyper-parameter in our quasi-periodic covariance kernel was 
P = 43.8 ± 0.9 d, which is consistent with the ~ 44 d stellar 
rotation period derived by H14. 

We did not consider any multiple-planet models. We defer 
such analyses of this and other datasets to future work, in which 
we aim to leverage a better understanding of appropriate priors 
over our GP model’s hyper-parameters to perform more rigorous 
Bayesian model comparison. 

4.4 The a Cen B dataset 

The announcement by Dumusque et al. (2012) of the detection of 
a planet around the modestly-active, K1 V star a Cen B caused 
a major stir in the exoplanet community: if verified, the claimed 
planet aCenBb would be the closest (distance: 1.34 pc) exo¬ 
planet to Earth ever discovered, and the lowest-minimum-mass 
planet detected around a Solar-type star. In particular, the claimed 
planet has an orbital period of 3.24 d, and a minimum mass of 
1.13 ± 0.09 M®. 

Dumusque et al. (hereafter D12) obtained 459 HARPS RV 
datapoints, along with ancillary FWHM, BIS and logRnK litne 


series, over a period of 4 years. The DI2 RVs are dominated by a 
long-term linear trend, which is due to the orbit of a Cen B around 
the centre of mass of the a Cen binary system. Once this trend is 
subtracted, a gradual rise and fall over the 4-year span of the obser¬ 
vations is evident, as well as variability on shorter time-scales. D12 
interpreted the gradual rise and fall as a signature of the star’s activ¬ 
ity cycle, and the quasi-periodic variations on time-scales of a few 
weeks as caused by the rotational modulation of star spots. Lastly, 
there are short-term variations evident in all time series. These vari¬ 
ations are correlated in time, with a time-scale of a few days, as well 
as with each other (across all three time series), albeit in a some¬ 
what complex way. It is unclear whether these variations might be 
due to activity, observational noise, or a combination of both. Of 
course, in the case of the RV data in isolation, a part of this vari¬ 
ability could also be explained by a planetary companion with a 
relatively short orbital period. 

In any event, DI2 used a variety of mathematical transfor¬ 
mations to try to filter out many sources of RV variance - in¬ 
cluding starspots, photospheric granulation, and binary motion due 
to the presence of companion star a Cen A - to isolate the puta¬ 
tive planet’s RV signal. The amplitude of the binary-motion signal 
they removed was on the order of hundreds of m s“^; the com¬ 
bined amplitude of the nuisance signals ascribed to stellar activ¬ 
ity they removed was on the order of 10 m s“^; and the final, 
claimed planetary signal they isolated had a semi-amplitude of 
about ~ 0.5 m s“^. For comparison, the long-term precision of 
HARPS is 0.8 m s^^Mayor et al. 2003). 

However, the detection was a contentious one, with a num¬ 
ber of authors calling into question the planet’s existence (see e.g. 
Hatzes 2012, 2013). Some of the possible reservations about the 
approach used by Dumusque et al. include the following. 

(i) Short-term stellar rotational activity was dealt with by fitting 
sinusoidal waves at the rotational period of the star, and a variable 
number of harmonics. The constraint that all activity-induced vari¬ 
ations should be strictly sinusoidal is very hard to motivate phys¬ 
ically (Lanza et al. 2001; Brinkworth et al. 2005), and subtracting 
multiple sinusoids from the ARV time series opens the possibility 
of inadvertently introducing periodic signals (or harmonics thereof) 
that weren’t present in the original data. 

(ii) Available activity-sensitive time series (BIS and log Rhk) 
were not jointly modelled with the ARV time series - even though 
it is in the log R'uk hnte series that the rotational signal was most 
prominent. Instead, a low-pass smoothing filter was applied to the 
logJ^HK time series; it was then simply noted that the smoothed 
time series looked similar to the ARV time series, and the shape 
of this smoothed time series was then used to fit the radial velocity 
variations in order to mitigate the effects of long-term activity. 

(iii) Each of the four observing seasons was fitted independently 
of the others, thus ignoring any information about possible long¬ 
term trends/correlations in the time series, and resulting in a model 
with almost four times as many degrees of freedom as might’ve 
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Figure 10. GP model MAP fit to the publicly-available G115 A data from H14. The fit was performed to the full 15 years of data, comprising 117 observations; 
since most of the observations were made during the 2011 season (BID >2,455,500), however, only data from this season are plotted here. The dots indicate 
H14’s data (with estimated errors, where applicable); the solid lines are model posterior means, and the shaded regions denote zto- posterior uncertainty. 


been the case if fitting were performed for all seasons simultane¬ 
ously. 

(iv) The model used to fit the ARV time series alone, without a 
planet, contained 23 free parameters, and the combined fit for ac¬ 
tivity and a planet contained at least 26 free parameters. The ques¬ 
tion of possible over-fitting was not addressed in either case. For 
comparison, our baseline/planet-free GP model - discussed below 
- models multiple time series simultaneously, yet contains signifi¬ 
cantly fewer free parameters than D12’s model.^ 

(v) No rigorous model comparison was performed to quantify 
the evidence for the existence of the planet, i.e. the extent to which 
a planetary model is (or isn’t) preferred. 

Given that our GP framework for modelling activity would 
allows us, in principle, to avoid all of these limitations, we decided 
to apply it to the publicly-available D12 dataset. 

As in all previous tests, we used a quasi-periodic covariance 
kernel for our latent process. We modelled all 459 datapoints in 
each of the ARV, logi^HK^nd BIS time series jointly using this 
single latent process and its derivative.® The ARV time series was 
corrected for binary motion, with a residual DC offset removed 

^ Of course, a more meaningful application of Occam’s razor would re¬ 
quire marginalisation over the full volumes of respective model parameter 
spaces. 

® The FWHM data are noisier than, but very tightly correlated with, 
log 7?j;, and thus are unlikely to contain useful extra information. Both 
can be seen as proxies for the integrated spot coverage of the visible hemi- 


for each season (in general we obtained identical results whether 
we removed long-term drifts prior to the GP modelling, or in¬ 
cluded extra terms in the GP mean functions to take these into ac¬ 
count; as a computational convenience to reduce the dimensional¬ 
ity of our hyperparameter space, then, we opted for the former ap¬ 
proach). Error estimates were published by D12 for the ARV and 
log Rjjk tints series; these estimates were included in our mod¬ 
elling. Though not directly provided, BIS errors were estimated 
(following D12’s prescription) using provided photon-noise esti¬ 
mates. Finally, as before, non-informative priors were placed on all 
model (hyper)parameters. 

Our baseline GP model did not incorporate any Keplerian 
component in the mean function for the RVs. In total, the combined 
model for all three time series contained 14 free (hyper)parameters 
- in particular, it should be noted that all three time series across 
all four observing seasons were fitted using this same model. This 
stands in contrast to the approach used by D12, where the activity 
signal in each of the four seasons was fitted independently, since 
the model they used required that stellar features did not evolve 
significantly during the time considered, whereas spots and plages 
and spots are known to evolve on a time-scale of a few rotations. 
The MAP fit using this model is presented in Fig. 12, while detailed 
diagnostics of the ARV residuals are presented in Fig. 13. 

The ARV residuals appear to be normally-distributed, with 


sphere. On the other hand, the BIS also depends on the velocity of the 
stellar surface at the location of the spots (or active regions). 
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Figure 11. Diagnostics of the ARV residuals for G1 15 A, after subtracting our MAP GP model (stellar activity, plus binary orbit and one planet). The top left 
panel contains the residuals plotted as a function of time, spanning some 15 years; the top right panel contains a histogram of the residuals (rms: 0.69 m s“^); 
and the bottom panel shows the normalised Lomb-Scargle periodogram of the residuals, along with false alarm probability thresholds. 


rms 0.70 m s“^. The only significant features in the power spec¬ 
trum of the ARV residuals, presented in Fig. 13, is a complex of 
peaks tightly clustered around 1.00d(< 0.1% false alarm proba¬ 
bility), and a broad complex of peaks clustered near the ~ 37.8 d 
stellar rotation period (> 1% false alarm probability). The lack of 
other peaks in the residuals suggests that the ARV variations can 
be explained as arising from activity alone. 


The presence of residual power near the stellar rotation period 
indicates that our framework does not do a perfect job of describ¬ 
ing the rotationally-modulated activity signals - at least, not when 
fitting all four seasons with a single model. This might be indica¬ 
tive of either differential rotation, as suggested by D12, or of spot 
pattern evolution; a more detailed investigation of this possibility is 
deferred to a future study. 


As noted in Section 4.2, the peaks around 1.00 d arise from 
the window function in the a Cen B dataset. In particular, there are 
peaks at /i = 0.97 d~^ and /2 = 1.03 d“^; these peaks appear 
in the power spectra of both the model and the observations, and 
may be interpreted as the first-order daily aliases of the 1/37.8 d~^ 
rotational frequency. A third peak at /a = 1.00 d~^ appears only 
in the ARV residuals, but not in the model or the observations: this 
may be interpreted as a beat frequency arising during the computa¬ 


tion of residuals, on account of interference between the other two 
closely-spaced frequencies, /i and / 2 .® 

When we did include a Keplerian component in the mean 
function for the RVs, we were unable to extract any significant 
(non-activity) signal. Even when forcing the Keplerian signal to 
have a period of close to 3.24 d, the MAP solution favoured a 
Keplerian signal with an arbitrarily small (within the bounds of 
a Jeffreys prior) amplitude. Conversely, when forcing the Keple¬ 
rian signal to have an amplitude of about ~ 0.5 m s“^, the MAP 
solution favoured either a very short (Pk ^ 1 d) or very long 
(Pk 3> 1 year period), so that this signal essentially became a 
DC offset to the ARV time series. Furthermore, our non-detection 
of the planetary signal was also robust against different choices 
of covariance kernel functions (e.g. squared exponential, rational 
quadratic). Nor did allowing for somewhat tighter or looser links 
between the RVs and the activity-sensitive logRjjj^ and BIS time 
series, by modifying the priors on the three cr;. terms in our model, 
allow us to detect a 3.24 d Keplerian signal in the data. 

In summary, using the activity-sensitive logRnK and BIS 
time series to constrain the activity component of the RV time se¬ 
ries suggests that all of the significant variation in the RV time se- 


® To understand the beat phenomenon, it is useful to re¬ 
call the trigonometric identity: sin(27r/if) ± sin(27r/2f) = 

2 sin (2n 
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ries can be explained as being induced by stellar activity, without 
requiring any additional Keplerian component. 

We do not claim that this finding demonstrates that the claimed 
planet a Cen Bb definitely does not exist; however, the difficulty 
we have in recovering the signal as something unrelated to activity 
underscores just how important it is to model activity carefully and 
robustly, if we are to detect and characterise planets at the sub- 
m -level. 

In the near future, we aim to make a more rigorous and com¬ 
prehensive study of D12’s a CenB dataset. We would like to in¬ 
vestigate in which situations our planet-free models lead to a signal 
with a 3.24 d period in the RV residuals. This should happen at least 
in some cases, since our GP framework should be able to reproduce 
the model of Dumusque et al. as a special case - even if only with 
a non-optimal model (see Fig. 14), and/or when loosening signifi¬ 
cantly the links between the ancillary, activity-sensitive time series 
and the RVs. Concurrently, we would like to undertake detailed 
studies of planetary detection limits using both our GP framework 
and a model akin to that used by Dumusque et al., for synthetic 
data similar to the a Cen B dataset. To draw definitive conclusions 
about whether the data really suggest the presence of a planetary 
signal, we would like to perform rigorous Bayesian model compar¬ 
ison, i.e. of planet vs. no-planet models. Finally, planetary consid¬ 
erations aside, we would like to investigate whether the inability of 
our model to remove all of the rotationally-modulated activity in 
the ARV time series might be indicative of either differential ro¬ 
tation, or spot pattern evolution; one approach might be to use our 
GP framework to model each of the four seasons of data indepen¬ 
dently, and to study the changes (if any) in characteristic periodicity 
for each season. These analyses will form the focus of a separate 
paper. 

In the longer term, more measurements with denser time sam¬ 
pling - coupled with a better understanding of the noise characteris¬ 
tics of the RV data - could also lead to more definitive conclusions 
about the existence of a Cen Bb. 


5 DISCUSSION AND CONCLUSIONS 

Stellar activity can induce RV variations that can drown out or even 
mimic planetary signals, and it is notoriously difficult to model and 
thus mitigate the effects of these activity-induced nuisance signals. 
This is expected to be a major obstacle to using next-generation 
spectrographs to detect lower and lower mass planets, planets with 
longer periods, and planets around more active stars. We have pre¬ 
sented here a new framework for modelling RV time series jointly 
with one or more ancillary, activity-sensitive time series (including 
photometry, line widths, chromospheric activity indices, line asym¬ 
metries, etc.), with a view to better constraining activity signals in 
RVs, and thus better detecting and characterising possible planets. 

Our framework treats the underlying stochastic process giving 
rise to activity signals in all available observables (RVs and ancil¬ 
lary time series) as being described by a GP, with suitably-chosen 
covariance function. We then use physically-motivated and empir¬ 
ical models to link this GP to the observables; with the addition of 
noise and deterministic components (e.g. dynamical effects for the 
RVs), all observables can be modelled jointly as GPs, with the an¬ 
cillary time series thus serving to constrain the activity component 
of the RVs. 

We demonstrated the performance of our framework using 
both synthetic (Sections 4.1 and 4.2) and real (Sections 4.3 and 4.4) 
data. We started by noting that our framework can model all avail¬ 


able time series jointly, and exactly, in the simplest case where the 
physical/empirical relationships between all time series holds ex¬ 
actly. Next, using more realistic data simulated using the SOAP 2.0 
tool (including noise and realistic time sampling), we showed that 
our framework does a good job of constraining activity signals in 
RV data, provided we allow for additive white-noise terms in each 
time series we model; these additive noise terms may be interpreted 
as accounting for the extent to which our assumed relationships be¬ 
tween the time series does not hold exactly, observational noise 
notwithstanding. We also showed that the framework can be used 
to disentangle activity and planetary signals - even when the plan¬ 
etary signal is weaker than and has a period identical to the activ¬ 
ity signal - thus serving its intended purpose. Moving to real data, 
we applied our framework to the G1 15 A system; we were able 
to disentangle activity and planetary components in a HIRES RV 
dataset, and obtained a fitted planetary model which was consistent 
with one published by Howard et al. (2014). Finally, we turned our 
attention to the much-discussed a Cen B dataset: we showed that, 
contrary to the analysis of Dumusque et al. (2012), we were able 
to attribute the observed radial velocity variations to stellar activity, 
without requiring a planet. 

Although our framework hinges, at least partly, upon a num¬ 
ber of approximations and empirical relationships, its performance 
appears promising, and it offers a number of advantages over ex¬ 
isting approaches to mitigating activity in RV datasets. It is flexi¬ 
ble; though algebraically non-trivial, it is conceptually simple, and 
makes minimal assumptions about the properties of the underly¬ 
ing processes inducing activity signals in observables; it facilitates 
smooth interpolation between observables, as well as extrapola¬ 
tion to future times (prediction); and lastly, the entire framework 
is accommodated very naturally within the broader framework of 
Bayesian inference. As such, we hope that our new framework will 
form a useful addition to the toolbox of those confronted with mit¬ 
igating stellar activity signals in RV datasets. 

An investigation of ways in which our framework could be 
improved would represent a natural extension to this work. Some 
possibilities worth studying include the following. 

(i) Developing a more realistic relationship between RV and an¬ 
cillary time series, especially the BIS time series - perhaps based 
on theoretical considerations, or perhaps through empirical means, 
e.g. by studying simulated data. Preliminary tests suggest the in¬ 
clusion of a curvature term (second derivative of the latent pro¬ 
cess) could be helpful for more accurately coupling the RV and BIS 
time series; though it would add significant algebraic complexity, it 
would only add one free parameter to the overall model. 

(ii) Performing more comprehensive studies of planet detection 
rates and false-positive rates under our framework. 

(iii) Extending the SOAP 2.0 tests to include plages and spot 
evolution. 

(iv) Investigation of a more physically-motivated GP covariance 
function, e.g. a function from the Matem class of covariance func¬ 
tions, which are differentiable only a finite number of times, and 
have the squared exponential function as a smooth limiting case 
(Rasmussen & Williams 2006). 

(v) Investigation of more physically-motivated priors for all 
model (hyper)parameters, with a view to computing Bayes factors 
and performing rigorous model comparison. In particular, this will 
require marginalisation over GP hyper-parameters, rather than fix¬ 
ing them at their MAP values. 

(vi) Using raw, high-resolution spectra to construct a quantity 
that is a more sensitive proxy to stellar activity than ones currently 
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Figure 12. GP model MAP fit to the publicly-available a Cen B data from D12. All four seasons, comprising some 459 observations, were fit simultaneously, 
i.e. using a single set of (hyper)parameters. The dots indicate D12’s data (with estimated errors, where applicable); the solid lines ai'e model posterior means, 
and the shaded regions denote itcr posterior uncertainty. 


in widespread use (BIS, log etc.). An artificial neural network 
might be useful for machine-learning such a quantity from a large 
dataset. 

(vii) Investigation of including a GP component to model cor¬ 
related instrumental noise, rather than forcing such noise to be ar¬ 
tificially absorbed by an additive white-noise component, as is the 
case in our current framework. 

In tandem with the above, as discussed in Section 4.4, we aim to 
present in the near future a far more detailed and rigorous analysis 
of the much-discussed a Cen B dataset. We would like to inves¬ 
tigate in which situations our fitted planet-free models lead to a 
signal with a 3.24 d period in the RV residuals (this should happen 
at least in some cases, since our framework should be able to re¬ 
produce the model of Dumusque et al. as a special case); we would 
like to undertake detailed studies of planetary detection limits us¬ 
ing both our GP framework and a model akin to that used by Du¬ 
musque et al., for synthetic data similar to the oCenB dataset; 


and we would like to perform Bayesian model comparison (planet 
vs. no-planet models) in order to draw definitive conclusions about 
whether the data really suggest the presence of a planetary signal. 
These analyses will form the focus of a separate paper. 
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Figure 13. Diagnostics of the ARV residuals for a Cen B, after subtracting our MAP GP model (stellar activity plus binary orbit). The top left panel contains 
the residuals plotted as a function of time the top right panel contains a histogram of the residuals (rms: 0.70 m s“^); and the bottom panel shows the 
normalised Lomb-Scargle periodogram of the residuals, along with false alarm probability thresholds. 



HJD - 2455140 


Figure 14. Demonstration that the predictions of a GP model can subsume those of D12’s model as a special case. The grey squares correspond to the 
publicly-available Q;CenB data from D12, after removing binary and long-term magnetic cycle contributions, while the black circles are predictions from 
D12’s published activity model, for the same times as the real observations. Draws from a GP model (a few examples of which are plotted) fitted to the 
noisy data can be found such that they essentially interpolate D12’s model predictions; for visual clarity, only one observing season is shown here, but the 
interpolation holds true for the other seasons as well. Indeed, it is straightforward (though not necessarily useful or desirable, and ignoring considerations of 
over-fitting) to find hyper-parameters that will force a GP model to agree exactly with D12’s activity model, for arbitrarily-many observing times. 
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APPENDIX A: COVARIANCE BETWEEN 
OBSERVATIONS AND DERIVATIVE OBSERVATIONS 


Note that some of the equations given below already appear, albeit 
without further explanation, in the main body of the paper. The 
aim of this appendix is to provide some further details to allow 
these covariance kernels and their derivatives to be implemented 
computationally. 

Suppose we have a Gaussian process G characterised by a co- 
variance function 7. Then the covariance between an observation 
of G at time ti, and an observation of its derivative G at time tj, is 
given by 


(G,dG)i 


7 "-’ "'{tipj) = 


(Al) 


where is used to denote the covariance between 

(non-derivative) observations of G at times ti and tj. Similarly, the 
covariance between two observations of G at times ti and tj is 


(dG.dG).^, ^ 

^ dt'dt^ 


(G.G) 


{t,t') 


(A2) 


For convenience, we present below the relevant expressions 
for ^(dG,dG) gjj, jpj. jjjg covariance functions con¬ 

sidered specifically in this paper. The above relations are, however, 
valid for any covariance function 7; as such, the two examples be¬ 
low serve as an illustration of how the relevant expressions can be 
derived for any covariance function. 


Al Squared-exponential covariance 


Using our ‘generalised squared-exponentiaT covariance function, 
the expression for the covariance between two observations of a 
process G at times t and t' is: 


7 


(G.G) 

se 


N 

{t, t') = ®^p 

i=l 


2A? 


(A3) 


the Pi’s control the relative amplitude of the N ^ 1 components 
with evolutionary time-scales A;. For the case = 1, this reduces 
to the standard square-exponential covariance function. 

Following equation (Al), the covariance between an observa¬ 
tion of G at time t and an observation of G at time t' is given by 


(G.dG) 


{t,t') 




A? 


exp 


2 A? 


and, by equation (A2), the covariance between an observation of G 
at time t' and an observation of G at time t is given by 

yi?’‘^°Ht',t) = -yi^’‘^°\t,t'). (A5) 

The covariance is symmetric, by definition, from which it follows 
that ,t) = 'yii^’‘^\t,t'). Finally, the covariance be¬ 

tween two observations of G at times t and t' is given by 

N 

=Ea' 

i=l 

(A6) 
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it - t'f 

exp 
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A2 Quasi-periodic covariance 


Using a quasi-periodic covariance function formed from a squared 
exponential kernel, the expression for the covariance between two 
observations of a process G at times t and t' is: 


= exp 


sin^ [7r(t — f')/P] {t — t')^ \ 
2 A| 2^ J ’ 


(A7) 


where P and Ap correspond to the period and length scale of the 
periodic component of the variations, and Ae is an evolutionary 
time-scale. While Ae has units of time, Ap is dimensionless, as it 
is relative to P. 

Defining cf> = 27r(f—f')/P, we then obtain the expressions for 
the covariance between derivative and non-derivative observations: 


= Pqp(Uf') 

and 


TT sin (j) 

2PXI 


t-t'' 


(A8) 


7i';’‘‘^Ht',t) = --fi';’‘^^\t,ty, (A9) 

by symmetry of the covariance, •yqp’‘^'^\t',t) = jqp'^''^\t,t'). 
Finally, the covariance between two observations of G at times t 
and t' is given by 

yif’‘^°\t,t')=y^q°’°\t,t') X 


TT^ sin^ 0 I Tv^coscj) psinp {t — t')^ ^ 1 

4p2^4 + p2^2 A^^ A| 
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